Decision tree based text-to-phoneme mapping for speech recognition

Authors

Janne Suontausta

Juha Häkkinen

Abstract

In many embedded speech recognition systems, the phonetic transcriptions of the vocabulary items, i.e., the lexicons, cannot be stored on the device beforehand. A text-to-phoneme mapping is therefore needed to create the transcriptions from plain text. Several approaches have been evaluated in the literature. In this paper, a decision tree based text-to-phoneme mapping is studied. A decision tree is trained for each letter, according to information-theoretic criteria, on a pronunciation dictionary that contains the phoneme transcriptions of a large number of words. Context information is used to create the mapping. In our experiments, the mapping was constructed on the Carnegie Mellon pronunciation dictionary [1]. The phoneme accuracy of the most effective mapping was 99% on the training set and 91% on the test set of the pronunciation dictionary. The mapping was also implemented in a speaker-independent isolated-word recognition system. When the training lexicon contained the test vocabulary, the recognition rates in both the clean and the car-noise test environments were close to the baseline rates obtained with the correct transcriptions. When the test vocabulary differed significantly from the training vocabulary, the mapping performed below our expectations.
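The per-letter training described in the abstract can be illustrated with a short Python sketch. It rests on assumptions the abstract does not state: the dictionary is taken to be pre-aligned so that every letter maps to exactly one phoneme or to a null symbol `_` for a silent letter (letters such as x that map to two phonemes would need a combined label, omitted here); the context is assumed to be the two letters on each side; and the "information theoretic criteria" are modeled as ID3-style information gain with multiway splits on letter identity. All names (`train`, `transcribe`, `OFFSETS`, `LEXICON`, and so on) and the toy lexicon are hypothetical illustrations, not the authors' implementation or the real CMU dictionary data.

```python
from collections import Counter, defaultdict
from math import log2

PAD = "#"                  # hypothetical word-boundary padding symbol
NULL = "_"                 # hypothetical null phoneme for silent letters
OFFSETS = (-2, -1, 1, 2)   # assumed context window: two letters on each side


def context(word, i):
    """Letters at each context offset around position i (PAD outside the word)."""
    return tuple(word[i + d] if 0 <= i + d < len(word) else PAD for d in OFFSETS)


def entropy(labels):
    """Shannon entropy (bits) of a list of phoneme labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def build_tree(samples, features):
    """ID3-style tree over (context, phoneme) samples; splits on the context
    position with the highest information gain."""
    labels = [p for _, p in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: pure node or no features left
    base = entropy(labels)
    best, best_gain = None, 1e-12  # require a strictly positive gain
    for f in features:
        groups = defaultdict(list)
        for ctx, p in samples:
            groups[ctx[f]].append(p)
        remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:
        return majority  # no split reduces entropy
    branches = defaultdict(list)
    for ctx, p in samples:
        branches[ctx[best]].append((ctx, p))
    rest = [f for f in features if f != best]
    return {"feature": best, "default": majority,
            "children": {v: build_tree(s, rest) for v, s in branches.items()}}


def classify(tree, ctx):
    """Walk the tree; an unseen context value falls back to the node's majority."""
    while isinstance(tree, dict):
        child = tree["children"].get(ctx[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree


def train(aligned_lexicon):
    """Train one tree per letter from (word, one-phoneme-per-letter) pairs."""
    per_letter = defaultdict(list)
    for word, phones in aligned_lexicon:
        for i, letter in enumerate(word):
            per_letter[letter].append((context(word, i), phones[i]))
    features = list(range(len(OFFSETS)))
    return {letter: build_tree(s, features) for letter, s in per_letter.items()}


def transcribe(trees, word):
    """Map a word to phonemes, dropping null phonemes."""
    phones = []
    for i, letter in enumerate(word):
        if letter not in trees:
            continue  # letter never seen in training: skipped in this sketch
        p = classify(trees[letter], context(word, i))
        if p != NULL:
            phones.append(p)
    return phones


# Toy pre-aligned lexicon (hypothetical; CMU-style symbols, stress omitted).
LEXICON = [
    ("cat", ["K", "AE", "T"]),
    ("cot", ["K", "AA", "T"]),
    ("city", ["S", "IH", "T", "IY"]),
    ("cent", ["S", "EH", "N", "T"]),
    ("knot", [NULL, "N", "AA", "T"]),
]

if __name__ == "__main__":
    trees = train(LEXICON)
    print(transcribe(trees, "cit"))  # unseen word -> ['S', 'IH', 'T']
```

In this toy run, the tree for `c` splits on the following letter, because that position alone separates /K/ from /S/ and therefore gives the largest information gain. The silent `k` in "knot" is learned as the null phoneme. Training one tree per letter keeps each tree small. The cost is that a letter absent from the training data has no tree at all, so a practical system would need a fallback for it.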


Similar resources

Weighted entropy training for the decision tree based text-to-phoneme mapping

The pronunciation model providing the mapping from the written form of words to their pronunciations is called the text-to-phoneme (TTP) mapping. Such a mapping is commonly used in automatic speech recognition (ASR) as well as in text-to-speech (TTS) applications. Rule based TTP mappings can be derived for structured languages, such as Finnish and Japanese. Data-driven TTP mappings are usually ...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Neural networks for text-to-speech phoneme recognition

This paper presents two different artificial neural network approaches for phoneme recognition for text-to-speech applications: Staged Backpropagation Neural Networks and Self-Organizing Maps. Several current commercial approaches rely on an exhaustive dictionary approach for text-to-phoneme conversion. Applying neural networks for phoneme mapping for text-to-speech conversion creates a fast dis...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Speech Recognition Using Monophone and Triphone Based Continuous Density Hidden Markov Models

Speech recognition is the process of transcribing speech to text. Phoneme-based modeling is used, wherein each phoneme is represented by a continuous density hidden Markov model. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech signal, and delta and double-delta features representing the temporal rate of change of the features are added, which considerably improves the recognition accura...

Publication date: 2000
